ABSTRACT

Summary: Community curation, which harnesses community intelligence
for knowledge curation, holds great promise for dealing with the flood
of biological knowledge. To exploit the full potential of the
scientific community for knowledge curation, multiple biological wikis
(bio-wikis) have been built to date. However, none of them has
achieved a substantial impact on knowledge curation. One of the major
limitations of bio-wikis is insufficient community participation,
which stems from the lack of explicit authorship and thus of credit
for community curation. To increase community curation in bio-wikis,
here we develop AuthorReward, an extension to MediaWiki, to reward
community-curated efforts in knowledge curation. AuthorReward
quantifies researchers' contributions by factoring in both edit
quantity and edit quality, and yields automated explicit authorship
according to these quantitative contributions. AuthorReward provides
bio-wikis with an authorship metric that can help to increase
community participation in bio-wikis and to achieve community curation
of massive biological knowledge.
Supplementary information: Supplementary data are available at
Bioinformatics online.

1 INTRODUCTION

Biological knowledge is generated at ever-faster rates and is
dispersed among researchers and across the literature. As each new
biological study has become increasingly dependent on the availability
of existing knowledge, a comprehensive and up-to-date collection of
biological knowledge across a wide variety of research fields is of
critical significance in the life sciences (Clark, 2007).

Traditionally, biological knowledge has been aggregated through expert
curation, conducted manually by dedicated experts. However, with the
burgeoning volume of biological data and an increasingly diverse and
information-dense published literature, expert curation becomes more
and more laborious and time consuming, increasingly lagging behind
knowledge creation. Accordingly, community curation, which harnesses
community intelligence for knowledge curation, has gained significant
attention as a solution to this issue (Salzberg, 2007; Waldrop, 2008;
Zhang et al., 2011). A successful example that engages community
intelligence in knowledge aggregation is Wikipedia, which features
up-to-date content, huge coverage and low maintenance cost. Inspired
by the extraordinary success of Wikipedia, multiple biological wikis
(bio-wikis) have been built to date (Supplementary Table S1).

However, bio-wikis have not achieved a substantial impact on community
curation of biological knowledge (Finn et al., 2012). One of the major
limitations of bio-wikis is insufficient participation from the
scientific community, which stems from the lack of explicit authorship
and thus of credit for community-curated contributions (Finn et al.,
2012; Howe et al., 2008). A valuable attempt has been made to motivate
community contributions in wikis by means of social rewarding
techniques (Hoisl et al., 2007), but it does not provide explicit
authorship for any wiki page. Although authorship has been introduced
in a non-MediaWiki-based system (Hoffmann, 2008), it only links every
sentence to its author and does not provide a quantitative measure of
authorship; most importantly, it is inapplicable to extant bio-wikis,
which are largely built on MediaWiki (a free, open source and widely
used wiki engine, which is adopted by Wikipedia). Several initiatives
based on semantic web technologies have already emerged for biological
knowledge management (Antezana et al., 2009). However, they do not
promise to manage or quantify authorship of the free text in
bio-wikis. To increase community curation in bio-wikis, here we
develop AuthorReward, an extension to MediaWiki, to reward
community-curated efforts in bio-wikis by contribution quantification
and explicit authorship.



2 ALGORITHMS 

MediaWiki allows anyone to develop customized functionality by
packaging code as a MediaWiki extension. Thus, AuthorReward is
implemented as an extension to MediaWiki. Although MediaWiki itself
includes an infrastructure for individual contributions to be
recognized, it only records the revision history and provides no
explicit authorship.

A wiki page contains a collection of knowledge on a specific subject,
to which multiple researchers are likely to contribute edits
collaboratively. AuthorReward aims to provide a viable quantification
of researchers' contributions in bio-wikis. A major concern with
automated authorship has been ensuring that authorship cannot be
'manipulated' by spurious, short-lived edits (Supplementary Text S1).
For any wiki page $p$, we assume there is a series of edit versions
$v_0, v_1, v_2, \ldots, v_n$, where version $v_0$ is empty and
$n > 0$. AuthorReward counts multiple successive versions edited by
one researcher as one version. Thus, any neighboring versions,
$v_{i-1}$ and $v_i$ (where $1 \le i \le n$), are edited by different
researchers. The edit distance between $v_i$ and $v_j$, termed
$d(v_i, v_j)$ (where $i < j$), is computed as the Levenshtein distance
(LD) (Levenshtein, 1966), which measures the minimum number of edit
operations (insertions, deletions and substitutions) required to
transform one string into the other. In AuthorReward, the contribution
score of version $v_i$, $CS(v_i)$, is formulated straightforwardly as



$$CS(v_i) = c\,[d(v_{i-1}, v_n) - d(v_i, v_n)], \qquad (1)$$

where $c$ is the scale factor, $d(v_{i-1}, v_n)$ is the edit distance
between $v_{i-1}$ and $v_n$ and $d(v_i, v_n)$ is the edit distance
between $v_i$ and $v_n$.
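
The scoring in Equation (1) can be sketched in Python as follows. This is a minimal illustration rather than the extension's actual code: the function names, the (researcher, text) revision tuples, the default c = 1 and the toy history are all assumptions made for the example.

```python
from typing import List, Tuple

Revision = Tuple[str, str]  # (researcher, full page text after the edit)


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def collapse_successive(revisions: List[Revision]) -> List[Revision]:
    """Keep only the last of each run of consecutive revisions by one researcher."""
    collapsed: List[Revision] = []
    for researcher, text in revisions:
        if collapsed and collapsed[-1][0] == researcher:
            collapsed[-1] = (researcher, text)
        else:
            collapsed.append((researcher, text))
    return collapsed


def contribution_scores(revisions: List[Revision], c: float = 1.0) -> List[Tuple[str, float]]:
    """Equation (1) for every collapsed version v_1..v_n; v_0 is the empty page."""
    versions = collapse_successive(revisions)
    if not versions:
        return []
    final_text = versions[-1][1]  # v_n
    scores = []
    previous_text = ""  # v_0
    for researcher, text in versions:
        score = c * (levenshtein(previous_text, final_text)
                     - levenshtein(text, final_text))
        scores.append((researcher, score))
        previous_text = text
    return scores


# Toy history (invented): bob's addition is later removed by alice.
history = [("alice", "sd1 gene"), ("bob", "sd1 gene XXXX"), ("alice", "sd1 gene")]
print(contribution_scores(history))
```

In this toy history the reverted addition receives a negative score. Because consecutive terms of Equation (1) cancel, the scores of one page always sum to c times d(v_0, v_n).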

In Equation (1), $CS(v_i)$ factors in edit quality as well as edit
quantity in an implicit manner; the edit quantity of version $v_i$,
$QTY(v_i)$, amounts to the edit distance between $v_i$ and its
previous version $v_{i-1}$, viz., $d(v_{i-1}, v_i)$ [Equation (2)],
and the edit quality of version $v_i$, $QAL(v_i)$, corresponds to
whether the edit persists in comparison with the last version $v_n$
[Equation (3)].

$$QTY(v_i) = d(v_{i-1}, v_i) \qquad (2)$$

$$QAL(v_i) = \frac{d(v_{i-1}, v_n) - d(v_i, v_n)}{d(v_{i-1}, v_i)} \qquad (3)$$



According to the triangle inequality, $QAL(v_i)$ ranges from $-1$,
when the edits were entirely reverted, to $+1$, indicating that the
edits were totally preserved in the last version. In other words,
$QAL(v_i)$ measures how well the edit survives in the latest version;
a high (or low) quality score is given to version $v_i$ if its edits
are long-lived (or short-lived). Consequently, $CS(v_i)$ can be
expressed as the scale factor $c$ times $QTY(v_i)$ multiplied by
$QAL(v_i)$, namely, $CS(v_i) = c \times QTY(v_i) \times QAL(v_i)$.
Thus, $CS(v_i)$ is not easily gamed, providing a viable quantification
of researchers' contributions.
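
The bounds on QAL and the product form follow directly from Equations (1) to (3), as the short derivation below spells out with the scale factor c of Equation (1) kept explicit. It assumes only that the edit distance d is symmetric and satisfies the triangle inequality, as the Levenshtein distance does, and that d(v_{i-1}, v_i) > 0. The numbers in the closing comment are an invented illustration, not data from the study.

```latex
\begin{align*}
d(v_{i-1}, v_n) &\le d(v_{i-1}, v_i) + d(v_i, v_n)
  &&\Rightarrow\quad d(v_{i-1}, v_n) - d(v_i, v_n) \le d(v_{i-1}, v_i), \\
d(v_i, v_n) &\le d(v_i, v_{i-1}) + d(v_{i-1}, v_n)
  &&\Rightarrow\quad d(v_{i-1}, v_n) - d(v_i, v_n) \ge -d(v_{i-1}, v_i).
\end{align*}
% Dividing both bounds by d(v_{i-1}, v_i) > 0 gives -1 <= QAL(v_i) <= +1.
\begin{align*}
c \times QTY(v_i) \times QAL(v_i)
  &= c \, d(v_{i-1}, v_i) \,
     \frac{d(v_{i-1}, v_n) - d(v_i, v_n)}{d(v_{i-1}, v_i)} \\
  &= c \, [d(v_{i-1}, v_n) - d(v_i, v_n)] = CS(v_i).
\end{align*}
% Illustration: five characters are added in v_i and later removed, so that
% v_n equals v_{i-1}. Then d(v_{i-1}, v_n) = 0 and
% d(v_i, v_n) = d(v_{i-1}, v_i) = 5, giving
% QTY = 5, QAL = (0 - 5) / 5 = -1 and CS = -5c.
```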

Considering that one researcher may provide many discontinuous edits
across the evolution of a wiki page, and thereby contribute multiple
versions to one wiki page, the contribution score of researcher $r$ in
page $p$, $S(r, p)$, is quantified as the sum over all contributed
versions,

$$S(r, p) = \sum_{v_i \in E(r, p)} CS(v_i), \qquad (4)$$

where $E(r, p)$ is the set of versions contributed by researcher $r$
in page $p$. As a consequence, the total contribution of researcher
$r$ in a bio-wiki is defined as the sum of the contribution scores
over all pages in which the researcher participated,

$$S(r) = \sum_{p \in P} S(r, p), \qquad (5)$$

where $P$ is the set of pages in which researcher $r$ provides edits.
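
The two sums in Equations (4) and (5) amount to grouping per-version scores by researcher, first within a page and then across pages. The sketch below assumes the per-version scores CS(v_i) have already been computed; the function names, page titles and scores are hypothetical. The final ranking step is only one plausible way to turn scores into an ordered author list; the ordering and tie-breaking rule shown are assumptions and are not specified in the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

VersionScore = Tuple[str, float]  # (researcher, CS(v_i))


def page_scores(version_scores: Iterable[VersionScore]) -> Dict[str, float]:
    """Equation (4): S(r, p), summed over the versions E(r, p) of each researcher."""
    totals: Dict[str, float] = defaultdict(float)
    for researcher, score in version_scores:
        totals[researcher] += score
    return dict(totals)


def wiki_scores(pages: Dict[str, Iterable[VersionScore]]) -> Dict[str, float]:
    """Equation (5): S(r), summed over the pages P that researcher r edited."""
    totals: Dict[str, float] = defaultdict(float)
    for version_scores in pages.values():
        for researcher, score in page_scores(version_scores).items():
            totals[researcher] += score
    return dict(totals)


def ranked_authors(scores: Dict[str, float]) -> List[str]:
    """Assumed rule: order by descending score, ties broken alphabetically."""
    return [r for r, _ in sorted(scores.items(), key=lambda item: (-item[1], item[0]))]


# Hypothetical pages and scores, for illustration only.
pages = {
    "Page_A": [("alice", 8.0), ("bob", -5.0), ("alice", 5.0)],
    "Page_B": [("bob", 12.0), ("carol", 3.0)],
}
print(page_scores(pages["Page_A"]))
print(wiki_scores(pages))
print(ranked_authors(wiki_scores(pages)))
```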



3 APPLICATION AND FEATURES 

To test the functionality of AuthorReward, we installed it in RiceWiki
(http://ricewiki.big.ac.cn). For testing purposes, we chose the
semi-dwarfing gene (sd1), which is one of the most important genes
deployed in modern rice breeding and is also known as the 'green
revolution gene' affecting plant height in rice. Nine researchers
collaboratively annotated the sd1 gene, providing 87 versions as of
August 23, 2012 (Supplementary Table S2;
http://ricewiki.big.ac.cn/index.php/Os01g0883800).

As tested on the sd1 gene (Supplementary Fig. S1), AuthorReward is
capable of yielding sensible quantitative contributions and providing
automated explicit authorship, in good agreement with the perceptions
of all participating contributors. Moreover, AuthorReward is
compatible with any MediaWiki-based system and is simple to install;
it therefore has a broad scope of application and offers an appearance
and functionality consistent with Wikipedia.



4 CONCLUSION 

AuthorReward provides bio-wikis with an authorship metric, featuring
robust contribution quantification and automated explicit authorship.
When contribution is appropriately quantified and authorship is duly
rewarded, it is possible to exploit the full potential of the
scientific community in knowledge curation.

Although AuthorReward does not contribute directly to the integration
of biological knowledge, it provides a standard practice to reward
community-curated efforts, which in turn can increase community
participation in bio-wikis for knowledge curation. Our intention here
is to produce an automated, simple and robust authorship metric; no
automated measure will be able to gauge scientific content.
AuthorReward can be used in combination with semantic web
technologies, potentially promising a significant advance in
harnessing community intelligence for knowledge curation. In addition,
social rewarding techniques (e.g. peer rating) can be used together
with AuthorReward for contribution evaluation. Moreover, in the long
term it is likely that community-curated efforts will be integrated
across multiple bio-wikis for each researcher, which will require
close collaboration among bio-wikis and standardized mechanisms for
individual identity recognition (e.g. OpenID at
http://www.openid.net).

AuthorReward provides a standard practice to reward community-curated
efforts in bio-wikis, and it is of interest to members of the
scientific community who intend to perform knowledge curation
collectively and collaboratively in bio-wikis and in other domain
wikis.